Diabetic retinopathy (DR) is a complication of diabetes, and one of the major causes of vision impairment in the global population. As the early-stage manifestation of DR is usually very mild and hard to detect, an accurate diagnosis via eye-screening is clinically important to prevent vision loss at later stages. In this work, we propose an ensemble method to automatically grade DR using ultra-wide optical coherence tomography angiography (UW-OCTA) images available from Diabetic Retinopathy Analysis Challenge (DRAC) 2022. First, we adopt the state-of-the-art classification networks, i.e., ResNet, DenseNet, EfficientNet, and VGG, and train them to grade UW-OCTA images with different splits of the available dataset. Ultimately, we obtain 25 models, of which, the top 16 models are selected and ensembled to generate the final predictions. During the training process, we also investigate the multi-task learning strategy, and add an auxiliary classification task, the Image Quality Assessment, to improve the model performance. Our final ensemble model achieved a quadratic weighted kappa (QWK) of 0.9346 and an Area Under Curve (AUC) of 0.9766 on the internal testing dataset, and the QWK of 0.839 and the AUC of 0.8978 on the DRAC challenge testing dataset.
translated by 谷歌翻译
Cross-view geo-localization aims to spot images of the same location shot from two platforms, e.g., the drone platform and the satellite platform. Existing methods usually focus on optimizing the distance between one embedding with others in the feature space, while neglecting the redundancy of the embedding itself. In this paper, we argue that the low redundancy is also of importance, which motivates the model to mine more diverse patterns. To verify this point, we introduce a simple yet effective regularization, i.e., Dynamic Weighted Decorrelation Regularization (DWDR), to explicitly encourage networks to learn independent embedding channels. As the name implies, DWDR regresses the embedding correlation coefficient matrix to a sparse matrix, i.e., the identity matrix, with dynamic weights. The dynamic weights are applied to focus on still correlated channels during training. Besides, we propose a cross-view symmetric sampling strategy, which keeps the example balance between different platforms. Albeit simple, the proposed method has achieved competitive results on three large-scale benchmarks, i.e., University-1652, CVUSA and CVACT. Moreover, under the harsh circumstance, e.g., the extremely short feature of 64 dimensions, the proposed method surpasses the baseline model by a clear margin.
translated by 谷歌翻译
Twitter机器人检测已成为打击错误信息,促进社交媒体节制并保持在线话语的完整性的越来越重要的任务。最先进的机器人检测方法通常利用Twitter网络的图形结构,在面对传统方法无法检测到的新型Twitter机器人时,它们表现出令人鼓舞的性能。但是,现有的Twitter机器人检测数据集很少是基于图形的,即使这些基于图形的数据集也遭受有限的数据集量表,不完整的图形结构以及低注释质量。实际上,缺乏解决这些问题的大规模基于图的Twitter机器人检测基准,严重阻碍了基于图形的机器人检测方法的开发和评估。在本文中,我们提出了Twibot-22,这是一个综合基于图的Twitter机器人检测基准,它显示了迄今为止最大的数据集,在Twitter网络上提供了多元化的实体和关系,并且与现有数据集相比具有更好的注释质量。此外,我们重新实施35代表性的Twitter机器人检测基线,并在包括Twibot-22在内的9个数据集上进行评估,以促进对模型性能和对研究进度的整体了解的公平比较。为了促进进一步的研究,我们将所有实施的代码和数据集巩固到Twibot-22评估框架中,研究人员可以在其中始终如一地评估新的模型和数据集。 Twibot-22 Twitter机器人检测基准和评估框架可在https://twibot22.github.io/上公开获得。
translated by 谷歌翻译
当前相机图像和信号处理管道(ISP)(ISP),包括深度训练的版本,倾向于应用一个均匀地应用于整个图像的单个过滤器。尽管大多数获取的相机图像具有空间异构的伪影。这种空间异质性在图像空间中表现为各种莫尔振铃,运动模糊,颜色漂白或透镜的投影失真。此外,这些图像伪像的组合可以存在于所获取的图像中的小或大像素邻域中。这里,我们介绍了一种在学习的潜在子空间中工作的深度加强学习模型,通过基于补丁的空间自适应伪影滤波和图像增强来递归地改善相机图像质量。我们的RSE-RL模型视图识别和纠正作为递归自学习和自我改善练习,并由两个主要子模块组成:(i)通过等级变分自动获得的潜在特征子空间聚类/分组-Encoder能够快速识别嘈杂和清洁图像补丁之间的对应和差异。 (ii)由信任区域软演员 - 批评代理控制的自适应学习转换,逐步过滤并使用其最接近的清洁补丁的特征距离邻居增强嘈杂的补丁。可以在基于贴剂的ISP中引入的人工伪影,也通过基于奖励的去阻断恢复和图像增强来消除。我们通过在图像上进行递归训练和测试来展示我们模型的自我改善特征,其中每个时代产生的增强图像为RSE-RL训练过滤管道提供自然数据增强和鲁棒性。
translated by 谷歌翻译
相机图像信号处理(ISP)管道,包括深度学习训练的版本,可以在不同的图像信号处理任务中获得吸引力的结果。但是,大多数这些方法(如果不是全部)倾向于在整个图像上应用单个滤波器。当对任务训练编码器类型的深度体系结构时,也尤其如此。但是,自然要将摄像机图像视为异质,因为即使在单个图像的两个维度域上,颜色强度和人造噪声也大不相同。多样化的Moire响起,运动裂,颜色射或基于镜头的投影失真都可能导致异质图像伪影滤波问题。在本文中,我们提出了一个基于贴片的本地子空间深神经网络,该网络可改善相机ISP对异质伪像(尤其是图像denoising)具有稳健性。我们称我们的三倍训练的模型为补丁子空间学习自动编码器(PSL-AE)。 PSL-AE不一定假设图像失真级别,也不重复或相似的伪影类型。相反,PSL-AE首先诊断编码从嘈杂和干净的图像对提取的斑块,具有不同的人工类型和失真级别,相比之下。然后,使用先前的混合模型将每个图像的贴片编码为适当的潜在子空间的软群。最后,PSL-AE的解码器还以针对每个软群集中图像贴片的无监督方式进行训练。我们的实验结果表明,通过合成的伪影,又是现实的SIDD图像对,通过改进的异质过滤可以实现的灵活性和性能。
translated by 谷歌翻译
This paper provides a pair similarity optimization viewpoint on deep feature learning, aiming to maximize the within-class similarity s p and minimize the between-class similarity s n . We find a majority of loss functions, including the triplet loss and the softmax cross-entropy loss, embed s n and s p into similarity pairs and seek to reduce (s n − s p ). Such an optimization manner is inflexible, because the penalty strength on every single similarity score is restricted to be equal. Our intuition is that if a similarity score deviates far from the optimum, it should be emphasized. To this end, we simply re-weight each similarity to highlight the less-optimized similarity scores. It results in a Circle loss, which is named due to its circular decision boundary. The Circle loss has a unified formula for two elemental deep feature learning paradigms, i.e., learning with class-level labels and pair-wise labels. Analytically, we show that the Circle loss offers a more flexible optimization approach towards a more definite convergence target, compared with the loss functions optimizing (s n − s p ). Experimentally, we demonstrate the superiority of the Circle loss on a variety of deep feature learning tasks. On face recognition, person re-identification, as well as several finegrained image retrieval datasets, the achieved performance is on par with the state of the art.
translated by 谷歌翻译
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc, revealing that: 1) Distilling token relations is more effective than CLS token- and feature-based distillation; 2) An intermediate layer of the teacher network as target perform better than that using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over the scratch MIM pre-training on ImageNet-1K classification, using all the ViT-Tiny, ViT-Small, and ViT-base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in AE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way for developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
translated by 谷歌翻译
Dataset distillation has emerged as a prominent technique to improve data efficiency when training machine learning models. It encapsulates the knowledge from a large dataset into a smaller synthetic dataset. A model trained on this smaller distilled dataset can attain comparable performance to a model trained on the original training dataset. However, the existing dataset distillation techniques mainly aim at achieving the best trade-off between resource usage efficiency and model utility. The security risks stemming from them have not been explored. This study performs the first backdoor attack against the models trained on the data distilled by dataset distillation models in the image domain. Concretely, we inject triggers into the synthetic data during the distillation procedure rather than during the model training stage, where all previous attacks are performed. We propose two types of backdoor attacks, namely NAIVEATTACK and DOORPING. NAIVEATTACK simply adds triggers to the raw data at the initial distillation phase, while DOORPING iteratively updates the triggers during the entire distillation procedure. We conduct extensive evaluations on multiple datasets, architectures, and dataset distillation techniques. Empirical evaluation shows that NAIVEATTACK achieves decent attack success rate (ASR) scores in some cases, while DOORPING reaches higher ASR scores (close to 1.0) in all cases. Furthermore, we conduct a comprehensive ablation study to analyze the factors that may affect the attack performance. Finally, we evaluate multiple defense mechanisms against our backdoor attacks and show that our attacks can practically circumvent these defense mechanisms.
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
As one of the prevalent methods to achieve automation systems, Imitation Learning (IL) presents a promising performance in a wide range of domains. However, despite the considerable improvement in policy performance, the corresponding research on the explainability of IL models is still limited. Inspired by the recent approaches in explainable artificial intelligence methods, we proposed a model-agnostic explaining framework for IL models called R2RISE. R2RISE aims to explain the overall policy performance with respect to the frames in demonstrations. It iteratively retrains the black-box IL model from the randomized masked demonstrations and uses the conventional evaluation outcome environment returns as the coefficient to build an importance map. We also conducted experiments to investigate three major questions concerning frames' importance equality, the effectiveness of the importance map, and connections between importance maps from different IL models. The result shows that R2RISE successfully distinguishes important frames from the demonstrations.
translated by 谷歌翻译